Feature Weight Optimization for Discourse-Level SMT

Authors

  • Sara Stymne
  • Christian Hardmeier
  • Jörg Tiedemann
  • Joakim Nivre
Abstract

We present an approach to feature weight optimization for document-level decoding. This is an essential task for enabling future development of discourse-level statistical machine translation, as it allows easy integration of discourse features in the decoding process. We extend the framework of sentence-level feature weight optimization to the document level. We show experimentally that we can get competitive and relatively stable results when using a standard set of features, and that this framework also allows us to optimize document-level features, which can be used to model discourse phenomena.

Similar resources

Semantics, Discourse and Statistical Machine Translation

In the past decade, statistical machine translation (SMT) has been advanced from word-based SMT to phrase- and syntax-based SMT. Although this advancement produces significant improvements in BLEU scores, crucial meaning errors and lack of cross-sentence connections at discourse level still hurt the quality of SMT-generated translations. More recently, we have witnessed two active movements in SM...

Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

We describe Docent, an open-source decoder for statistical machine translation that breaks with the usual sentence-by-sentence paradigm and translates complete documents as units. By taking translation to the document level, our decoder can handle feature models with arbitrary discourse-wide dependencies and constitutes an essential infrastructure component in the quest for discourse-aware SMT.

Document-Wide Decoding for Phrase-Based Statistical Machine Translation

Independence between sentences is an assumption deeply entrenched in the models and algorithms used for statistical machine translation (SMT), particularly in the popular dynamic programming beam search decoding algorithm. This restriction is an obstacle to research on more sophisticated discourse-level models for SMT. We propose a stochastic local search decoding method for phrase-based SMT, w...

Improving Implicit Discourse Relation Recognition Through Feature Set Optimization

We provide a systematic study of previously proposed features for implicit discourse relation identification, identifying new feature combinations that optimize F1-score. The resulting classifiers achieve the best F1-scores to date for the four top-level discourse relation classes of the Penn Discourse Tree Bank: COMPARISON, CONTINGENCY, EXPANSION, and TEMPORAL. We further identify factors for ...

Manyopt: An Extensible Tool for Mixed, Non-Linear Optimization Through SMT Solving

Optimization of Mixed-Integer Non-Linear Programming (MINLP) supports important decisions in applications such as Chemical Process Engineering. But current solvers have limited ability for deductive reasoning or the use of domain-specific theories, and the management of integrality constraints does not yet exploit automated reasoning tools such as SMT solvers. This seems to limit both scalabili...

Publication date: 2013